An EM Composite Likelihood Approach for Multistage Sampling of Family Data
Abstract
Multistage sampling of family data is a common design in genetic epidemiology, but appropriate methods for analyzing data collected under this design are still lacking. We propose a statistical approach based on the composite likelihood framework. The composite likelihood is a weighted product of individual likelihoods corresponding to the sampling strata, where the weights are the inverse sampling probabilities of the families in each stratum. Our approach is developed for time-to-event data and handles missing genetic covariates with an Expectation-Maximization (EM) algorithm. A robust variance estimator accounts for the dependence of individuals within families. Simulation studies show that the approach yields consistent and efficient estimates of the genetic relative risk in the presence of missing genotypes and under different multistage sampling designs. Finally, an application to a familial study of early-onset breast cancer illustrates the value of the approach: it confirms the important effect of the genes BRCA1 and BRCA2 in these families, and it also shows that incorrect inferences about this effect can be made if the sampling design is not properly taken into account.
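The weighting described in the abstract can be written out as a short sketch. The notation here is assumed for illustration and is not taken from the paper: s = 1, ..., S indexes the sampling strata, L_s(θ) is the likelihood contribution of stratum s, and π_s is the sampling probability of the families in stratum s.

```latex
% Assumed notation (not from the paper): strata s = 1..S,
% stratum likelihood L_s(\theta), family sampling probability \pi_s.
\[
  L_c(\theta) \;=\; \prod_{s=1}^{S} L_s(\theta)^{\,w_s},
  \qquad w_s = \frac{1}{\pi_s},
\]
\[
  \ell_c(\theta) \;=\; \log L_c(\theta) \;=\; \sum_{s=1}^{S} w_s \,\log L_s(\theta).
\]
```

Under this sketch, strata that are sampled with low probability receive larger weights, so their families count proportionally more in the estimating equations.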
Similar resources
A Bayesian Nominal Regression Model with Random Effects for Analysing Tehran Labor Force Survey Data
Large survey data are often accompanied by sampling weights that reflect the unequal probabilities of selection under complex sampling. Sampling weights act as an expansion factor that, by scaling the subjects, makes the sample representative of the population. The quasi-maximum likelihood method is one of the approaches for considering sampling weights in the frequentist framewo...
The Variational Bayesian EM Algorithm for Incomplete Data: with Application to Scoring Graphical Model Structures
We present an efficient procedure for estimating the marginal likelihood of probabilistic models with latent variables or incomplete data. This method constructs and optimises a lower bound on the marginal likelihood using variational calculus, resulting in an iterative algorithm which generalises the EM algorithm by maintaining posterior distributions over both latent variables and parameters....
Composite Likelihood EM Algorithm with Applications to Multivariate Hidden Markov Model
The method of composite likelihood is useful for dealing with estimation and inference in parametric models with high-dimensional data where the full likelihood approach renders computation intractable. We develop an extension of the EM algorithm in the framework of composite likelihood estimation given missing data or latent variables. We establish key theoretical properties of the composite l...
Sampling variability and estimates of density dependence: a composite-likelihood approach.
It is well known that sampling variability, if not properly taken into account, affects various ecologically important analyses. Statistical inference for stochastic population dynamics models is difficult when, in addition to the process error, there is also sampling error. The standard maximum-likelihood approach suffers from a large computational burden. In this paper, I discuss an application...
Multivariate Statistical Modeling with Survey Data
We describe an extension of the pseudo maximum likelihood (PML) estimation method developed by Skinner (1989) to multistage stratified cluster sampling designs, including finite population and unequal probability sampling. We conduct simulation studies to evaluate the performance of the proposed estimator. The estimator is also compared to the general estimating equation (GEE) method for linear r...